STATUS OF CLAIMS

Claims 1-10, 15-18, 23-24 and 29 are pending, claims 1 and 29 being the sole
independent claims. Claims 11-14, 19-22, and 30 have been withdrawn as being drawn to
non-elected inventions.

REMARKS 

Claims 1-10, 15-18, 23-24 and 29 are pending and rejected. The Applicant has 
amended Claims 1-2, 4-5, 9, 15, 17, 24 and 29 and added new claims 31 and 32. No new 
matter has been added by amending the claims. Support for such amendments can be found 
throughout the specification and specifically on pages 3, 4 and 8 of the specification.

Objection to Claims 1-10, 23-24 and 29 for informalities

The objection to Claims 1-10, 23-24 and 29 is made moot by the amendment to these
claims herein.



RECEIVED
CENTRAL FAX CENTER

AUG 13 2007



Rejection Under 35 U.S.C. § 112, first paragraph

Claims 1-10, 15-18, 23-24 and 29 were rejected under 35 U.S.C. §112, first 
paragraph. The Applicant respectfully disagrees. 

A. Written Description requirement Rejection 

The Office rejected Claims 1-10, 15-18, 23-24 and 29 under 35 U.S.C. 112, first 
paragraph as allegedly failing to describe in any fashion the "physical and/or chemical 
properties of the claimed class of biosynthetic intermediates and/or metabolites thereof or 
class of ergostarol-biosynthetic enzymes including squalene epoxidase..." 

In order to allow further prosecution on the merits, the Applicant has amended Claims
1, 24 and 29 to remove the phrase "and/or its biosynthetic intermediates and/or metabolites"
and insert instead specific biosynthetic intermediates and metabolites of
ergosta-5,7-dienol. The intermediates and metabolites added to the claims do not constitute
new matter because the scope of the amended claims is no broader than (and, thus, is
supported by) the specification as filed. Further support may be found on page 23,
lines 26-29 and page 24, lines 35-37.

With respect to the Office's rejection to "class of ergostarol-biosynthetic enzymes 
including squalene epoxidase", the Applicant respectfully disagrees. 

The Applicant amended the claims to specifically identify the intermediates
mevalonate, farnesyl pyrophosphate, geraniol pyrophosphate, squalene epoxide,
4-dimethylcholesta-8,14,24-trienol, 4,4-dimethylzymosterol, squalene, farnesol, geraniol,
lanosterol, zymosterone and zymosterol and metabolites campesterol, pregnenolone, 17-OH
pregnenolone, progesterone, 17-OH-progesterone, 11-deoxycortisol, hydrocortisone,
deoxycorticosterone and/or corticosterone of ergosta-5,7-dienol. These enzymes are well
known in the art to have specific biochemical reactivity, which is independent of the source
and the detailed structure. Furthermore, it is well established law that the written
description doctrine (as it relates to biotechnology) requires "a precise definition, such
as by structure, formula, chemical name, or physical properties, not a mere wish or plan for
obtaining the claimed chemical invention." See Regents of the University of California v.
Eli Lilly & Co., 119 F.3d 1559, 1566 (Fed. Cir. 1997). As stated above, the claims have been
amended to claim the intermediates and metabolites of ergosta-5,7-dienol, thereby
satisfying the written description requirement.

Based upon these amendments, the Applicant respectfully requests that this rejection
be withdrawn as it is made moot.

B. Enablement Rejection 

The Office rejected Claims 1-10, 15-18, 23-24 and 29 under 35 U.S.C. 112, first
paragraph as allegedly not enabling the process of producing ergosta-5,7-dienol, or any
biosynthetic intermediates and/or metabolites with a squalene epoxidase having 30%
sequence identity to SEQ ID NO: 8. The Applicant respectfully disagrees. However, to
allow further prosecution on the merits, the Applicant has deleted reference to "30%
sequence identity to SEQ ID NO: 8". The Applicant reserves the right to prosecute this
invention in a later filed application.

The Office further indicated that the scope of the claims is not commensurate with the
disclosure with regard to the "extremely large number of processes of producing the claimed
compounds." The Applicant respectfully disagrees.

In making the rejection, the Examiner has specifically contended that

"[...] of a protein determines its structural and functional properties,
predictability of which changes can be tolerated in a protein's amino acid
sequence and obtain the desired activity [...] sequence, if any, are tolerant
of modification and which are conserved (i.e., [...] to modification), and
detailed knowledge of the ways in which the proteins' structure relates to its
function." (Office Action, page [...])

The first paragraph of 35 U.S.C. § 112 requires that the specification describe how to
make and use the claimed subject matter. That requirement has been met in the present
application. In particular, the specification describes how to make ergosta-5,7-dienol, its
intermediates and metabolites. Throughout the specification there is mention of several
different techniques by which to obtain the desired Δ22-desaturase, HMG-CoA-reductase
and squalene epoxidase activities. Specifically, pages 5-6 of the specification
outline the various ways in which to reduce the expression of the nucleic acid encoding
Δ22-desaturase as follows:

a) introducing nucleic acid sequences which can be transcribed into an
antisense nucleic acid sequence which is capable of inhibiting the [...]

[...] nucleic acid sequences, which [...]

d) introducing specific DNA-binding factors, for example factors of the zinc [...]



PAGE 11/29 * RCVD AT 8/13/2007 6:54:22 PM [Eastern Daylight Time] * SVR:USPTO-EFXRF-5/4 * DNIS:2738300 * CSID:908 518 7795 * DURATION (mm-ss):10-20

Aug 13 2007 19:07 P. 12

MAYER & WILLIAMS PC Fax:908-518-7795

U.S. Serial No.: 10/549,871 
Examiner: Mohammad Y. Meah 
GAU: 1652 
Page 11 of 14 

In a preferred embodiment of the method according to the invention, the gene
expression of the nucleic acids encoding a Δ22-desaturase is reduced by generating knockout
mutants, especially preferably by homologous recombination.

Likewise, pages 11-12 of the specification identify several methods by which to
accomplish the desired activity (HMG-CoA-reductase and squalene epoxidase activities, to
name a few) in order to produce ergosta-5,7-dienol. For example:

1) modifying the promoter DNA sequences,

2) [...] or modifying gene expression by applying [exo]genous stimuli, for [...]
a) foreign substances, and

b) [...] protein (e.g., [...] protein) [...] may be for genes [...]

Thus, one skilled in the art would know how to produce such activities having at least
5%, more preferably at least 20%, more preferably at least 50%, more preferably
at least 100%, even more preferably at least 300%, especially preferably at least
500%, in particular at least 600% of the enzyme activity as compared to the wild type
organism. Therefore, regardless of the precise functional characteristics of the enzymes,
one [...] the enzyme activities using the disclosure provided [...].

Furthermore, skilled artisans in molecular biology have determined that 30% identity
is a reliable threshold for establishing evolutionary homology between two sequences aligned
over at least 150 residues. See Brenner et al., "Assessing sequence comparison methods with
reliable structurally identified distant evolutionary relationships," Proc. Natl. Acad. Sci.
USA [...]. For the [...], the Applicant has provided herewith a copy of the Brenner et al.
reference.

Claims [...] 17 [...] the method for producing ergosta-5,7-dienol and its intermediates
and metabolites in an organism with increased HMG-CoA reductase and [...] have [...]
SEQ ID NO: [...] 8, respectively, as [...] disclosed SEQ ID NO: 4 and 8, respectively.
The Applicant has amended Claims 9 and 17 and added new Claims 31 and 32 to fix the
grammar and to more clearly identify the nucleic acids introduced into the organisms.

As set forth in In re Marzocchi, 169 USPQ 367, 369 (CCPA 1971):

"[...] illustrative examples or by broad terminology [...] a specification
disclosure which contains a teaching of the manner and process of making and
using the invention in terms which correspond in scope to those used in
describing and defining the subject matter sought to be patented must be taken
as in compliance with the enabling requirement of the first paragraph of § 112
unless there is reason to doubt the objective truth of the statements
contained therein which must be relied on for enabling support."

[...] Based upon [...] request that this rejection be withdrawn.

Rejection of Claims 1-8, 15-16, 23-24 and 29 under 35 U.S.C. 102(b)

Claims 1-8, 15-16, 23-24 and 29 have been rejected under 35 U.S.C. 102(b) as
allegedly being anticipated by Saunders et al. (EP 0486290), hereinafter "Saunders".

"[...] using various mutant S. cerevisiae [...] with reduce expression of desaturase,
increase expression of [...] activity". The Applicant respectfully disagrees for the
following reasons.

"[...] sterols in yeast comprising [...] the expression level of a structural gene [...]
polypeptide having HMG-CoA reductase activity". See Saunders, Technical Invention on
page 2. Saunders does not teach the reduction of desaturase [...] reductase activity,
while also increasing squalene epoxidase activity (nor does Saunders teach the activities
of lanosterol C14-demethylase or squalene synthetase activities). In addition, the
International Preliminary Examination Report ("IPER") for the priority PCT application
differentiates the present application from that of Saunders:

"[Saunders] is considered to be the prior art closest to the subject matter of
claim 1 and the subject matter of claim 1 therefore differs from D1 in that the
method is implemented using organisms which, by comparison with the
present claims, additionally display an increased activity of a further enzyme
[...] from the group comprising lanosterol-C14-demethylase (erg11),
squalene [e]poxidase (erg1) and squalene synthetase [...]"

See 4.2 of IPER, emphasis added.

Therefore, Saunders does not teach every element of the claimed invention. Based 
upon this difference, Saunders does not anticipate the present application. The Applicant 
respectfully requests that this rejection be withdrawn. 

Rejection of Claims 1-8, 15-16, 23-24 and 29 under 35 U.S.C. 102(b)

Claims 1-8, 15-16, 23-24 and 29 have been rejected under 35 U.S.C. 102(b) as 
allegedly being anticipated by Weber et al. (WO-99/16886), hereinafter "Weber". 

As the Office indicated, Weber discloses "the method of making ergosta-5,7-dienol 
using various mutant S. cerevisiae strain with reduce expression of desaturase and increased 
expression of squalene expoxidase and HMG-CoA-reductase". The Applicant respectfully 
disagrees for the following reasons. 

Weber discusses a method of "producing ergosterol and intermediate products thereof
by means of recombinant yeast and plasmids for transforming yeasts". See Weber,
Abstract on page 1. The Office indicated that Weber teaches a reduction in expression of
desaturase. After review of the Weber reference, it does not appear to disclose a reduced
desaturase activity, which is one of the elements of the presently claimed invention. It is
respectfully requested that the Office indicate the specific location of such statement.
Based [...]

ABSTRACT Pairwise sequence comparison methods have
been assessed using proteins whose relationships are known
reliably from their structures and functions, as described in
the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T.
& Chothia, C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation
tested the programs BLAST [Altschul, S. F., Gish, W.,
Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol.
215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996)
Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. &
Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448],
and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol.
Biol. 147, 195-197] and their scoring schemes. The error rate
of all algorithms is greatly reduced by using statistical scores
to evaluate matches rather than percentage identity or raw
scores. The E-value statistical scores of SSEARCH and FASTA are
reliable: the number of false positives found in our tests agrees
well with the scores reported. However, the P-values reported
by BLAST and WU-BLAST2 exaggerate significance by orders of
magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform
best, and they are capable of detecting almost all relationships
between proteins whose sequence identities are >30%. For
more distantly related proteins, they do much less well; only
one-half of the relationships between proteins with 20-30%
identity are found. Because many homologs have low sequence
similarity, most distant relationships cannot be detected by
any pairwise comparison method; however, those which are
identified may be used with confidence.



Sequence database searching plays a role in virtually every 
branch of molecular biology and is crucial for interpreting the 
sequences issuing forth from genome projects. Given the 
method's central role, it is surprising that overall and relative 
capabilities of different procedures are largely unknown. It is 
difficult to verify algorithms on sample data because this 
requires large data sets of proteins whose evolutionary
relationships are known unambiguously and independently of the
methods being evaluated. However, nearly all known ho- 
mologs have been identified by sequence analysis (the method 
to be tested). Also, it is generally very difficult to know, in the 
absence of structural data, whether two proteins that lack clear 
sequence similarity are unrelated. This has meant that al- 
though previous evaluations have helped improve sequence 
comparison, they have suffered from insufficient, imperfectly 
characterized, or artificial test data. Assessment also has been 
problematic because high quality database sequence searching 
attempts to have both sensitivity (detection of homologs) and 
specificity (rejection of unrelated proteins); however, these 
complementary goals are linked such that increasing one
causes the other to be reduced. 

Sequence comparison methodologies have evolved rapidly,
so no previously published test has evaluated modern versions
of programs commonly used. For example, parameters in
BLAST (1) have changed, and WU-BLAST2 (2), which produces
gapped alignments, has become available. The latest version
of FASTA (3) previously tested was 1.6, but the current release
(version 3.0) provides fundamentally different results in the
form of statistical scoring.

The previous reports also have left gaps in our knowledge. 
For example, there has been no published assessment of 
thresholds for scoring schemes more sophisticated than per- 
centage identity. Thus, the widely discussed statistical scoring 
measures have never actually been evaluated on large data- 
bases of real proteins. Moreover, the different scoring schemes 
commonly in use have not been compared. 

Beyond these issues, there is a more fundamental question: 
in an absolute sense, how well does pairwise sequence
comparison work? That is, what fraction of homologous proteins
can be detected using modern database searching methods?

In this work, we attempt to answer these questions and to 
overcome both of the fundamental difficulties that have hin- 
dered assessment of sequence comparison methodologies. 
First, we use the set of distant evolutionary relationships in the
SCOP: Structural Classification of Proteins database (4), which
is derived from structural and functional characteristics (5).
The SCOP database provides a uniquely reliable set of
homologs, which are known independently of sequence
comparison. Second, we use an assessment method that jointly
measures both sensitivity and specificity. This method allows
straightforward comparison of different sequence searching 
procedures. Further, it can be used to aid interpretation of real 
database searches and thus provide optimal and reliable 
results. 

Previous Assessments of Sequence Comparison. Several
previous studies have examined the relative performance of
different sequence comparison methods. The most encompassing
analyses have been by Pearson (6, 7), who compared
the three most commonly used programs. Of these, the
Smith-Waterman algorithm (8) implemented in SSEARCH (3) is the
oldest and slowest but the most rigorous. Modern heuristics
have provided BLAST (1) the speed and convenience to make
it the most popular program. Intermediate between these two
is FASTA (3), which may be run in two modes offering either
greater speed (ktup = 2) or greater effectiveness (ktup = 1).
Pearson also considered different parameters for each of these
programs.

To test the methods, Pearson selected two representative
proteins from each of 67 protein superfamilies defined by the
PIR database (9). Each was used as a query to search the
database, and the matched proteins were marked as being
homologous or unrelated according to their membership of PIR



Abbreviation: EPQ, errors per query.

†Present address: Department of Structural Biology, Stanford
University, Fairchild Building D-109, Stanford, CA 94305-5126
hyper.stanford.edu.



6073 






6074 Biochemistry: Brenner et al.

superfamilies. Pearson found that modern matrices and
"ln-scaling" of raw scores improve results considerably. He also
reported that the rigorous Smith-Waterman algorithm worked
slightly better than FASTA, which was in turn more effective
than BLAST.

Large scale analyses of matrices have been performed
(10), and Henikoff and Henikoff (11) also evaluated the
effectiveness of BLAST and FASTA. Their test with BLAST
considered the ability to detect homologs above a predetermined
score but had no penalty for methods which also
reported large numbers of spurious matches. The Henikoffs
searched the SWISS-PROT database (12) and used PROSITE (13)
to define homologous families. Their results showed that the
BLOSUM62 matrix (14) performed markedly better than the
extrapolated PAM-series matrices (15), which previously had
been popular.

A crucial aspect of any assessment is the data that are used
to test the ability of the program to find homologs. But in
Pearson's and the Henikoffs' evaluations of sequence
comparison, the correct results were effectively unknown. This is
because the superfamilies in PIR and PROSITE are principally
created by using the same sequence comparison methods
which are being evaluated. Interdependency of data and
methods creates a "chicken and egg" problem, and means, for
example, that new methods would be penalized for correctly
identifying homologs missed by older programs. For instance,
immunoglobulin variable and constant domains are clearly
homologous, but PIR places them in different superfamilies.
This [...] is widespread: each superfamily in PIR 48.00 with
a structural homolog is itself homologous to an average of 1.6
other PIR superfamilies (16).

[...] Sander and Schneider (17) [...] protein structures to [...] sequence
comparison. [...] different sequence comparison [...], their work
focused on [...] a length-[...] of percentage identity, above which all
proteins would be of similar structure. A result of this analysis
was the HSSP equation; it states that proteins with 25% identity
over 80 residues will have similar structures, whereas shorter
alignments require higher identity. (Other studies also have
[...] (18-20), but focused on a small number
of model proteins and were principally oriented toward
evaluating alignment accuracy rather than homology detection.)

A general solution to the problem of scoring comes from
([...] E-values [and] P-values) based on the extreme value
distribution (21). Extreme value scoring was
implemented analytically in the BLAST program using the
[...] statistics (22, 23) and empirical ap[...]
[...] heralded as a reliable means of recognizing
[...] proteins (24, 25) [...] mathematical trac[...]
[...] feature of the BLAST [...].
The validity of this scoring procedure has been
[...] and references in [...] 24). However, all large
empirical tests used random sequences that may lack the subtle
structure found [...]. [...] although many researchers have
suggested that statistical scores be used to rank matches (24, 25, [...])
[...] determine the degree to which such rankings are
superior.

A Database for Testing Homology Detection. Since the
discovery that the structures of hemoglobin and myoglobin are
very similar though their sequences are not ([...]), it has been
apparent that comparing structures is a more powerful (if less
convenient) way to recognize distant evolutionary relationships
than comparing sequences. If two proteins show a high
degree of similarity in their structural details and function, it
is very probable that they have an evolutionary relationship
though their sequence similarity may be low.

Proc. Natl. Acad. Sci. USA 95 (1998)

The recent growth of protein structure information combined
with the comprehensive evolutionary classification in
the SCOP database (4, 5) have allowed us to overcome previous
limitations. With these data, we can evaluate the performance
of sequence comparison methods on real protein sequences
whose relationships are known confidently. The SCOP database
uses structural information to recognize distant homologs, the
large majority of which can be determined unambiguously.
[...] such as the globins or immunoglobulins, [...]
community despite the [...] of high sequence similarity.

From SCOP, we extracted the sequences of domains of
[...] the Protein [Data] Bank (PDB) (30) and created two
databases. One (PDB90D-B) has domains, which were all <90%
identical to any other, whereas (PDB40D-B) had those <40%
identical. The databases were created by first sorting all
protein domains in SCOP by their quality and making a list. The
[...] domain was [...] for inclusion in the [...].
[...] (and discarded) were all other domains above the threshold
[...] identity to the selected domain. This pro[...]
[...] list was empty. The PDB40D-B database
contains 1,323 domains, which have 9,044 ordered pairs of
distant relationships, or ~0.5% of the total 1,749,006 ordered
pairs. In PDB90D-B, the 2,079 domains have 53,9[?]8 relationships,
[...] 1.2% of all [...]. [Low] complexity regions
of sequence can achieve spurious high scores, so these were
masked in both databases by processing with the [...] SEG
(27) using recommended parameters: 12 1.8 2.0. The databases
[...] available from http://sss.stanford.edu/sss/,
and databases derived from the current version of SCOP
may be found at http://scop.mrc-lmb.cam.ac.uk/scop/.
[...] databases [...] generally consistent, but
PDB40D-B focuses on distantly related proteins and reduces the
[...] (31, 32), whereas PDB90D-B (with more sequences)
[...]. [...] where noted otherwise,
the distant homolog results here are from PDB[?]0D-B.
[...] here are specific to the
[...] databases used, we expect the trends to be [...]
Assessment Data and Procedure. Our assessment of se[...]
[...] tests. First, using just a single sequence comparison
algorithm at a time, we evaluated the effectiveness of
[...]. Second, we [...] the reliability
of [...], including an evaluation of the validity
of statistical scoring. Third, we compared sequence comparison
algorithms (using the optimal scoring scheme) to
[...] performance. Fourth, we examined the
distribution of homologs and considered the power of pairwise
[...] them. All of the analyses
[...] of structurally identified homologs and a
new assessment criterion.

[...] BLAST (1), version 1.4.9MP, and
WU-BLAST2 (2), version 2.0a13MP. Also assessed was the FASTA
[...] implementation of Smith-Waterman (8). For
[...] default parameters and matrix (BLOSUM[...])
were used for [...] and WU-BLAST2.

The "Coverage vs. Error" Plot. To test a particular protocol
([...] program and scoring scheme), each sequence
[...] was used as a [...] to [...] the data[...].
This yielded ordered pairs of query and target sequences with
[...] scores, [...] from best to worst. The ideal method would have
perfect separation, with all of the homologs at the top of the
list and unrelated proteins below. In practice, perfect separation
is impossible to achieve so instead one is interested in
drawing a threshold above which there are the largest number
of related pairs of sequences consistent with an acceptable
error rate.

[Figure: Smith-Waterman Scoring Schemes (PDB90D-B), coverage vs. error plot]
[...] of PDB90D-B database. All of the proteins in the database were compared
with each other using the SSEARCH program. The results [...] for statistical
scores, raw scores, and three measures using percentage identity. In the
coverage vs. error plot, the x axis [...]

Our procedure involved measuring the coverage and error
for every threshold. Coverage was defined as the fraction of
structurally determined homologs that have scores above the
selected threshold; this reflects the sensitivity of a method.
Errors per query (EPQ), an indicator of selectivity, is the
number of nonhomologous pairs above the threshold divided
by the number of queries. Graphs of these data, called
coverage vs. error plots, were devised to understand how



protocols compare at different levels of accuracy. These
graphs share effectively all of the beneficial features of
Receiver Operating Characteristic (ROC) plots (33, 34) but
better represent the high degrees of accuracy required in
sequence comparison and the huge background of
nonhomologs.
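
The two definitions above translate directly into a small calculation. The following is a minimal sketch, assuming each reported match is available as a `(score, is_homolog)` pair where a higher score is better; the function name and input layout are illustrative choices, not part of the original assessment software. For scores where lower is better (such as E-values), the scores could be negated first. Ties between scores are not given special treatment here.

```python
from typing import List, Tuple


def coverage_vs_error(hits: List[Tuple[float, bool]],
                      total_homologs: int,
                      n_queries: int) -> List[Tuple[float, float, float]]:
    """Return one (threshold, coverage, errors_per_query) triple per hit.

    hits: (score, is_homolog) for every reported query-target pair.
    total_homologs: number of structurally determined homologous pairs.
    n_queries: number of query sequences searched.
    """
    curve = []
    true_pos = 0
    false_pos = 0
    # Walk down the ranked list, placing the threshold at each score in turn.
    for score, is_homolog in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_homolog:
            true_pos += 1
        else:
            false_pos += 1
        curve.append((score, true_pos / total_homologs, false_pos / n_queries))
    return curve
```

Plotting coverage against errors per query for each triple gives the kind of coverage vs. error plot the text describes.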

This assessment procedure is directly relevant to practical 
sequence database searching, for it provides precisely the 
information necessary to perform a reliable sequence database 
search. The EPQ measure places a premium on score consistency;
that is, it requires scores to be comparable for different
queries. Consistency is an aspect which has been largely 



[Figure 3 plot: Percent Identity of Unrelated Proteins (PDB90D-B). y axis: percent identity of an alignment between two unrelated proteins; x axis: alignment length]

[Figure 2 images: Hemoglobin β-chain (1hds b) and Cellulase E2 (1tml), with their sequence alignment]

FIG. 2. Unrelated proteins with high percentage identity. Hemoglobin
β-chain (PDB code 1hds chain b, ref. 38, Left) and cellulase E2
(PDB code 1tml, ref. 39, Right) have 39% identity over 64 residues, a
level which is often believed to be indicative of homology. Despite this
high degree of identity, their structures strongly suggest that these
proteins are not related. Appropriately, neither the raw alignment
score of 85 nor the E-value of 1.3 is significant. Proteins rendered by [...]

FIG. 3. Length and percentage identity of alignments of unrelated
proteins in PDB90D-B: Each pair of nonhomologous proteins found with
SSEARCH is plotted as a point whose position indicates the length and
the percentage identity within the alignment. Because alignment
length and percentage identity are quantized, many pairs of proteins
may have exactly the same alignment length and percentage identity.
The line shows the HSSP threshold (though it is intended to be applied
with a different matrix and parameters).

[Figure 4 plot: Reliability of Statistical Scores (PDB90D-B). x axis: errors per query]

FIG. 4. Reliability of statistical scores in PDB90D-B: Each line shows
the relationship between reported statistical score and actual error
rate for a different program. E-values are reported for SSEARCH and
FASTA, whereas P-values are shown for BLAST and WU-BLAST2. If the
scoring were perfect, then the number of errors per query and the
E-values would be the same, as indicated by the upper bold line.
(P-values should be the same as EPQ for small numbers, and diverge
at higher values, as indicated by the lower bold line.) E-values from
SSEARCH and FASTA are shown to have good agreement with EPQ but
underestimate the significance slightly. BLAST and WU-BLAST2 are
overconfident, with the degree of exaggeration dependent upon the
score. The results for PDB40D-B were similar to those for PDB90D-B
despite the difference in number of homologs detected. This graph
could be used to roughly calibrate the reliability of a given statistical
score.

ignored in previous tests but is essential for the straightforward
or automatic interpretation of sequence comparison results.
Further, it provides a clear indication of the confidence that
should be ascribed to each match. Indeed, the EPQ measure
should approximate the expectation value reported by database
searching programs, if the programs' estimates are accurate.
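
The Fig. 4 caption notes that P-values should match EPQ for small numbers and diverge at higher values. One common way to see why, offered here as an assumption rather than something stated in the text, is to treat the number of chance matches as Poisson-distributed with mean E, so that the probability of at least one chance match is P = 1 - exp(-E). The helper name below is hypothetical.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance match, assuming the number of
    chance matches is Poisson-distributed with mean `e_value`."""
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-e_value)


for e in (0.001, 0.01, 0.1, 1.0, 10.0):
    print(f"E = {e:<6} P = {p_value_from_e_value(e):.6f}")
```

Under this assumption P and E are nearly equal below about 0.01, while P saturates toward 1 as E grows, which is consistent with the two bold reference lines described in the caption.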

The Performance of Scoring Schemes. All of the programs
tested could provide three fundamental types of scores. The
first score is the percentage identity, which may be computed
in several ways based on either the length of the alignment or
the lengths of the sequences. The second is a "raw" or
"Smith-Waterman" score, which is the measure optimized by
the Smith-Waterman algorithm and is computed by summing
the substitution matrix scores for each position in the alignment
and subtracting gap penalties. In BLAST, a measure
related to this score is scaled into bits. Third is a statistical
score based on the extreme value distribution. These results
are summarized in Fig. 1.

[Figure panel: Sequence Comparison Algorithms (PDB40D-B)]
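
The first two score types can be illustrated on a single gapped alignment. This is a minimal sketch under stated assumptions: the match/mismatch scoring function and the gap penalties are toy values chosen for illustration (not BLOSUM62 and not the parameters used in the assessment), and a run of adjacent gap columns is treated as one gap regardless of which sequence it falls in.

```python
from typing import Callable, Dict


def percent_identities(aln_a: str, aln_b: str,
                       len_a: int, len_b: int) -> Dict[str, float]:
    """Percentage identity with two different denominators.

    aln_a, aln_b: aligned strings of equal length, '-' marks a gap.
    len_a, len_b: full lengths of the two original sequences.
    """
    identical = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return {
        "by_alignment_length": 100.0 * identical / len(aln_a),
        "by_shorter_sequence": 100.0 * identical / min(len_a, len_b),
    }


def raw_score(aln_a: str, aln_b: str,
              substitution: Callable[[str, str], int],
              gap_open: int, gap_extend: int) -> int:
    """Sum of substitution scores minus gap penalties.

    gap_open is charged for the first column of a gap, gap_extend for
    each further column (both are assumed, illustrative values).
    """
    score = 0
    in_gap = False
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += substitution(a, b)
            in_gap = False
    return score


def toy_matrix(a: str, b: str) -> int:
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1
```

For the alignment `ACG-T` against `ACGGT`, the two identity measures differ (4 of 5 columns versus 4 of the 4 residues in the shorter sequence), which shows why the choice of denominator matters when quoting a percentage identity.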

Sequence Identity. Though it has been long established that
percentage identity is a poor measure (35), there is a common
rule-of-thumb stating that 30% identity signifies homology.
Moreover, publications have indicated that 25% identity can
be used as a threshold (17, 36). We find that these thresholds,
originally derived years ago, are not supported by present
results. As databases have grown, so have the possibilities for
chance alignments with high identity; thus, the reported cutoffs
lead to frequent errors. Fig. 2 shows one of the many pairs of
proteins with very different structures that nonetheless have
high levels of identity over considerable aligned regions.
Despite the high identity, the raw and the statistical scores for
such incorrect matches are typically not significant. The principal
reasons percentage identity does so poorly seem to be
that it ignores information about gaps and about the conservative
or radical nature of residue substitutions.

From the PDB90D-B analysis in Fig. 3, we learn that 30%
identity is a reliable threshold for this database only for
sequence alignments of at least 150 residues. Because one
unrelated pair of proteins has 43.5% identity over 62 residues,
it is probably necessary for alignments to be at least 70 residues
in length before 40% is a reasonable threshold, for a database
of this particular size and composition.
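
The two length-dependent cutoffs in this paragraph can be written as a small rule table. The sketch below encodes only the numbers quoted above (30% at 150 residues or more, 40% at 70 residues or more). The function name is hypothetical, and, as the text itself stresses, the values apply to a database of this particular size and composition, so they should not be read as general thresholds.

```python
def identity_is_reliable(percent_identity: float, alignment_length: int) -> bool:
    """Apply the two cutoffs quoted in the text for the PDB90D-B analysis."""
    # (minimum alignment length, minimum percent identity), taken from the text.
    rules = [(150, 30.0), (70, 40.0)]
    return any(alignment_length >= min_len and percent_identity >= min_id
               for min_len, min_id in rules)


# The unrelated pairs mentioned in the text fall below both rules.
print(identity_is_reliable(39.0, 64))   # hemoglobin / cellulase pair of Fig. 2
print(identity_is_reliable(43.5, 62))   # unrelated pair cited above
```

Both example pairs are rejected because their alignments are shorter than 70 residues, even though their percentage identity exceeds the 30% rule of thumb.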

At a given reliability, scores based on percentage identity 
detect just a fraction of the distant homologs found by 
statistical scoring. If one measures the percentage identity in 
the aligned regions without consideration of alignment length, 
then a negligible number of distant homologs are detected. 
Use of the hssp equation improves the value of percentage 
identity, but even this measure can find only 4% of all known 
homologs at 1% EPQ. In short, percentage identity discards 
most of the information measured in a sequence comparison. 

Raw Scores. Smith-Waterman raw scores perform better 
than percentage identity (Fig. 1), but ln-scaling (7) provided no 
notable benefit in our analysis. It is necessary to be very precise 
when using either raw or bit scores because a 20% change in 
cutoff score could yield a tenfold difference in EPQ. However, 
it is difficult to choose appropriate thresholds because the 
reliability of a bit score depends on the lengths of the proteins 
matched and the size of the database. Raw score thresholds 
also are affected by matrix and gap parameters. 

Statistical Scores. Statistical scores were introduced partly 
to overcome the problems that arise from raw scores. This 
scoring scheme provides the best discrimination between 
homologous proteins and those which are unrelated. Most 
likely, its power can be attributed to its incorporation of more 
information than any other measure; it takes account of the 
full substitution and gap data (like raw scores) but also has 
details about the sequence lengths and composition and is 
scaled appropriately. 

[Fig. 5. Sequence Comparison Algorithms (pdb90d-b). Five different sequence comparison methods are evaluated [...] 1 and wu-blast2 are almost as good. (B) pdb90d-b database. The quick wu-blast2 program [...] at 1% EPQ on this database, although at higher levels of error it becomes slightly worse [...]] 

We find that statistical scores are not only powerful, but also 
easy to interpret. ssearch and fasta show close agreement 
between statistical scores and actual number of errors per 
query (Fig. 4). The expectation value score gives a good, 
slightly conservative estimate of the chances of the two se- 
quences being found at random in a given query. Thus, an 
E-value of 0.01 indicates that roughly one pair of nonhomologs 
of this similarity should be found in every 100 different queries. 
Neither raw scores nor percentage identity can be interpreted 
in this way, and these results validate the suitability of the 
extreme value distribution for describing the scores from a 
database search. 
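
The interpretation of an E-value given above amounts to simple arithmetic, sketched below. The function names are hypothetical. The conversion from an E-value to a P-value uses the standard relation P = 1 - exp(-E), which assumes that the number of chance hits per query is Poisson distributed; that relation is background knowledge and is not stated in the text.

```python
import math


def expected_false_hits(e_value, n_queries):
    """Expected number of chance matches at this E-value over n_queries searches."""
    return e_value * n_queries


def p_value_from_e_value(e_value):
    """Probability of at least one chance hit in a single query.

    Assumes a Poisson number of chance hits, giving P = 1 - exp(-E).
    """
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # E = 0.01 over 100 queries: about one nonhomologous pair expected.
    print(expected_false_hits(0.01, 100))
    print(p_value_from_e_value(0.01))
```

For small E-values the two quantities are nearly equal, which is why E-values and P-values are often read interchangeably at strict cutoffs.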

The P-values from blast also should be directly interpret- 
able but were found to overstate significance by more than two 
orders of magnitude for 1% EPQ for this database. Nonethe- 
less, these results strongly suggest that the analytic theory is 
fundamentally appropriate. wu-blast2 scores were more re- 
liable than those from blast, but also exaggerate expected 
confidence by more than an order of magnitude at 1% EPQ. 

Overall Detection of Homology and Comparison of Algo- 
rithms. The results in Fig. 5A and Table 1 show that pairwise 
sequence comparison is capable of identifying only a small 
fraction of the homologous pairs of sequences in pdb40d-b. 
Even ssearch with E-values, the best protocol tested, could 
find only 18% of all relationships at a 1% EPQ. blast, which 
identifies 15%, was the worst performer, whereas fasta 
ktup = 1 is nearly as effective as ssearch. fasta ktup = 2 and 
wu-blast2 are intermediate in their ability to detect ho- 
mologs. Comparison of different algorithms indicates that 
those capable of identifying more homologs are generally 
slower. ssearch is 25 times slower than blast and 6.5 times 
slower than fasta ktup = 1. wu-blast2 is slightly faster than 
fasta ktup = 2, but the latter has more interpretable scores. 
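
The coverage figures quoted at 1% EPQ can be understood with a minimal sketch of the bookkeeping involved. It assumes that EPQ means errors per query (false matches divided by the number of queries) and that coverage is the fraction of all true homologous pairs reported before the error budget is exceeded. The function name and input layout are hypothetical, and ties between scores are not treated specially.

```python
def coverage_at_epq(hits, n_queries, n_true_pairs, max_epq=0.01):
    """Fraction of true homologous pairs found within an errors-per-query budget.

    hits: iterable of (score, is_homolog) with higher score = more significant.
          For E-values, pass the negated E-value as the score.
    """
    allowed_errors = max_epq * n_queries
    found = 0
    errors = 0
    for _score, is_homolog in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_homolog:
            found += 1
        else:
            errors += 1
            if errors > allowed_errors:
                break  # the cutoff lies just above this false match
    return found / n_true_pairs


if __name__ == "__main__":
    example = [(9, True), (8, True), (7, False),
               (6, True), (5, False), (4, True)]
    print(coverage_at_epq(example, n_queries=2, n_true_pairs=4, max_epq=0.5))
```

Walking down the ranked list and stopping when the error budget is exhausted corresponds to choosing the most permissive score cutoff that still meets the stated EPQ.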

In pdb90d-b, where there are many close relationships, the 
best method can identify only 38% of structurally known 
homologs (Fig. 5B). The method which finds that many 
relationships is wu-blast2. Consequently, we infer that the 
differences between fasta ktup = 1, ssearch, and wu-blast2 
programs are unlikely to be significant when compared with 
variation in database composition and scoring reliability. 

Fig. 6 helps to explain why most distant homologs cannot be 
found by sequence comparison: a great many such relation- 
ships have no more sequence identity than would be expected 
by chance. ssearch with E-values can recognize >90% of the 
homologous pairs with 30-40% identity. In this region, there 
are 30 pairs of homologous proteins that do not have signif- 
icant E-values, but 26 of these involve sequences with <50 
residues. Of sequences having 25-30% identity, 75% are 
identified by ssearch E-values. However, although the num- 
ber of homologs grows at lower levels of identity, the detection 
falls off sharply: only 40% of homologs with 20-25% identity 



Proc. Natl. Acad. Sci. USA 95 (1998) 6077 

Fig. 6. Distribution and detection of homology in pdb40d-b. Bars 
show the distribution of homologous pairs in pdb40d-b according to their 
identity (using the measure of identity in both). Filled regions indicate 
the number of these pairs found by the best database searching method 
(ssearch with E-values) at 1% EPQ. The pdb40d-b database contains 
proteins with <40% identity, and as shown on this graph, most 
structurally identified homologs in the database have diverged ex- 
tremely far in sequence and have <20% identity. Note that the 
alignments may be inaccurate, especially at low levels of identity. Filled 
regions show that ssearch can identify most relationships that have 
25% or more identity, but its detection wanes sharply below 25%. 
Consequently, the great sequence divergence of most structurally 
identified evolutionary relationships effectively defeats the ability of 
pairwise sequence comparison to detect them. 

are detected and only 10% of those with 15-20% can be found. 
These results show that statistical scores can find related 
proteins whose identity is remarkably low; however, the power 
of the method is restricted by the great divergence of many 
protein sequences. 

After completion of this work, a new version of pairwise 
blast was released: blastgp (37). It supports gapped align- 
ments, like wu-blast2, and dispenses with sum statistics. Our 
initial tests on blastgp using default parameters show that its 
E-values are reliable and that its overall detection of homologs 
was substantially better than that of ungapped blast, but not 
quite equal to that of wu-blast2. 

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27 
and references therein) suggests that the most effective se- 
quence searches are made by (i) using a large current database 
in which the protein sequences have been complexity masked 
and (ii) using statistical scores to interpret the results. Our 
experiments fully support this view. 

Our results also suggest two further points. First, the E-val- 
ues reported by fasta and ssearch give fairly accurate 
estimates of the significance of each match, but the P-values 
provided by blast and wu-blast2 underestimate the true 
extent of errors. Second, ssearch, wu-blast2, and fasta 
ktup = 1 perform best, though blast and fasta ktup = 2 
detect most of the relationships found by the best procedures 
and are appropriate for rapid initial searches. 

Table 1. Summary of sequence comparison methods with pdb40d-b 

Method                                  Relative Time*   1% EPQ Cutoff       Coverage at 1% EPQ 
ssearch % identity: within alignment    25.5             >70%                [illegible] 
ssearch % identity: within both         25.5             34%                 3.0 
ssearch % identity: HSSP-scaled         25.5             35% (HSSP + 9.8)    4.0 
ssearch Smith-Waterman raw scores       25.5             142                 10.5 
ssearch E-values                        25.5             0.03                18.4 
fasta ktup = 1 E-values                 3.9              0.03                17.9 
fasta ktup = 2 E-values                 1.4              0.03                16.7 
wu-blast2 P-values                      1.1              0.003               17.5 
blast P-values                          1.0              0.00016             14.8 

*Times are from large database searches with genome proteins. 

The homologous proteins that are found by sequence com- 
parison can be distinguished with high reliability from the huge 
number of unrelated pairs. However, even the best database 
searching procedures tested fail to find the large majority of 
distant evolutionary relationships at an acceptable error rate. 
Thus, if the procedures assessed here fail to find a reliable 
match, it does not imply that the sequence is unique; rather it 
indicates that any relatives it might have are distant ones.** 

**Additional and updated information about this work, including 
supplementary figures, may be found at http://sss.stanford.edu/s59A 
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3.2 



Reference to the following documents: 

D1: EP-A-0 486 290 (AMOCO CORP) 

20 May 1992 (1992-05-20) 
D2: DE 197 44 212 A (SCHERING AG) 

15 April 1999 (1999-04-15) 

The present application fails to satisfy the 
requirements of PCT Article 33(2) because the 
subject matter of claims 1-30 lacks novelty (PCT 
Article 33(2)). 

Document D1 discloses (the references in brackets 
are to said document) a method for the production 
of a variety of sterols which can be regarded as 
intermediate and/or resultant products of ergosta- 
5,7-dienol, said method involving the cultivation 
of organisms which have a reduced Δ22-desaturase 
(erg5) activity and an increased HMG-CoA-reductase 
activity relative to the wild type (see claim 13). 

Document D2 discloses a method for the production 
of a variety of sterols which can be regarded as 
intermediate and/or resultant products of 
ergosta-5,7-dienol, said method involving the 
cultivation of organisms which have an increased 
HMG-CoA-reductase activity and increased squalene 
epoxidase (erg1) activity relative to the wild type 
(see claim 2.a-iv)). 

The present application fails to satisfy the 
requirements of PCT Article 33(1) because the 
subject matter of claims 1-30 does not involve an 
inventive step (PCT Article 33(3)). 

Document D1 is considered to be the prior art 
closest to the subject matter of claim 1 and the 
subject matter of claim 1 therefore differs from 
D1 in that the method is implemented using 
organisms which, by comparison with the present 
claims, additionally display an increased activity 
of a further enzyme, which enzyme can be selected 
from the group comprising lanosterol-C14- 
demethylase (erg11), squalene epoxidase (erg1) and 
squalene synthetase (erg9). 

The problem addressed by the present invention can 
consequently be regarded as that of providing an 
alternative method for producing ergosta-5,7-dienol 
(and/or biosynthetic intermediate and/or 
resultant products thereof). 

4.4 The solution proposed in claims 1-10 and 15-18 of 
the present application cannot be considered 
inventive (PCT Article 33(3)) for the following 
reasons: 

as stated in point 3.2 above, D2 discloses (see 
claim 2.a-iv)) a method for the production of 
ergosta-5,7-dienol and/or biosynthetic 
intermediate and/or resultant products thereof by 
means of an increased t-HMG and erg1 activity. 
A person skilled in the art could therefore arrive 
at the solution to the present problem by 
combining the disclosures of D1 and D2, without 
unreasonable experimental input. 

4.5 Moreover, the problem cannot be considered to have 
been solved for the entire scope of protection 
claimed in claim 1: (i) it has not been shown that 
the problem has been solved for all intermediate 
and/or resultant products of ergosta-5,7-dienol. 
Tables 2 and 3 (data for S. cerevisiae 
GRFtHIura3ERG1erg5) show the decrease in the 
content of squalene (which can be considered to be 
an intermediate product of ergosta-5,7-dienol) by 
comparison with table 1 (data for S. cerevisiae 
GRFtHIura3) and table 3 (data for S. cerevisiae 
GRFtHIura3erg5); (ii) nor has it been shown that 
the aforementioned problem has been solved by an 
increase in lanosterol-C14-demethylase (erg11) or 
squalene synthetase (erg9) activity (in addition 
to a reduction in erg5 activity and an increase in 
HMG reductase activity). 

4.6 In consequence, claims 1-30 do not involve an 
inventive step (PCT Article 33(3)). 
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upon this difference, Weber does not anticipate the present application. The Applicant 
respectfully requests that this rejection be withdrawn. 

CONCLUSION 

In light of the foregoing Amendments and remarks, it is believed that the rejections 
and objections of record have been obviated, and allowance of this application is respectfully 
solicited. If a telephone conference would facilitate examination of this application in any 
way, the examiner is invited to contact the applicant's attorney at (619) 846-4850. The 
Examiner's consideration of this matter is gratefully acknowledged. 

FEES 

The Commissioner is authorized to charge the fees for a petition for a three-month 
extension of time for a large entity ($1020) and any other fees deemed necessary in 
connection with the above-application to Deposit Account No. 50-1047. 
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Mayer & Williams PC 
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WestfieId,NJ 07090 

Tel.: 619-846-4850 

Fax: (908)518-7795 



Respectfully submitted, 



Ann A. Wi< 



orek, Esq. 
Registration No. 46,087 
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and Tiriyi^ymce v ja Facsimile to; 57^273-8300 on 



. — Marfaffe Sgadfltj 

(Printed Name of Person Sending Correspondence) 
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